Building blocks, workflows, and agents
#Building_effective_agents
Read alongside the Building Effective Agents Cookbook.
In this section, we’ll explore the common patterns for agentic systems we’ve seen in production.
Building block: The augmented LLM
An LLM enhanced with augmentations such as retrieval, tools, and memory.
We recommend focusing on two key aspects of the implementation: tailoring these capabilities to your specific use case and ensuring they provide an easy, well-documented interface for your LLM.
MCP
For Client Developers (MCP Quickstart)
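The augmented-LLM building block can be sketched as a plain model call wrapped with retrieval, tool, and memory interfaces. In this sketch, `call_llm` and `AugmentedLLM` are hypothetical stand-ins (not a real API); the retrieval is naive keyword overlap, where a real system would use embeddings or MCP-served tools.

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    return f"[model answer to: {prompt.splitlines()[-1]}]"

@dataclass
class AugmentedLLM:
    memory: list = field(default_factory=list)     # past turns
    documents: dict = field(default_factory=dict)  # retrieval store
    tools: dict = field(default_factory=dict)      # name -> callable

    def retrieve(self, query: str) -> list:
        # Naive word-overlap retrieval; swap in embeddings for real use.
        words = set(query.lower().split())
        return [t for t in self.documents.values()
                if words & set(t.lower().split())]

    def run(self, user_input: str) -> str:
        # Assemble retrieved context plus memory, call the model, remember the turn.
        context = "\n".join(self.retrieve(user_input) + self.memory)
        answer = call_llm(f"{context}\nUser: {user_input}")
        self.memory.append(f"User: {user_input} -> {answer}")
        return answer

llm = AugmentedLLM(documents={"policy": "Refunds are processed within 5 days."})
answer = llm.run("How do refunds work?")
```

The point of the wrapper is the second recommendation above: each capability sits behind a small, well-documented interface the model (and you) can reason about.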
Workflow: Prompt chaining
Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one.
When to use this workflow: This workflow is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks.
Useful examples
Generating marketing copy, then translating it into a different language.
Writing an outline of a document, checking that the outline meets certain criteria, then writing the document based on the outline.
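The chain above can be sketched in a few lines: each step's output becomes the next step's input. `call_llm` is a hypothetical stand-in for a real model call, and the step instructions are illustrative only.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    return f"<{prompt.splitlines()[0]}> done"

def chain(task: str, steps) -> str:
    result = task
    for instruction in steps:
        # A programmatic "gate" could validate `result` here (e.g. check
        # the outline meets criteria) before continuing the chain.
        result = call_llm(f"{instruction}\nInput:\n{result}")
    return result

final = chain(
    "Announce our new search feature.",
    ["Write marketing copy for this announcement.",
     "Translate the copy into French."],
)
```

Chaining trades latency for accuracy: each call handles one easier subtask instead of one hard compound task.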
Workflow: Routing
Routing classifies an input and directs it to a specialized followup task.
In the diagram, an "LLM Call Router" dispatches each input.
This workflow allows for separation of concerns and for building more specialized prompts.
When to use this workflow: Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately, either by an LLM or a more traditional classification model/algorithm.
Useful examples
Directing different types of customer service queries (general questions, refund requests, technical support) into different downstream processes, prompts, and tools.
Routing easy/common questions to smaller models like Claude 3.5 Haiku and hard/unusual questions to more capable models like Claude 3.5 Sonnet to optimize cost and speed.
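A minimal routing sketch, matching the customer-service example above. The classifier here is keyword-based purely as a stand-in; the text notes it could equally be an LLM call or a traditional classification model. `call_llm` and the `ROUTE_PROMPTS` names are assumptions, not a real API.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    return f"[handled: {prompt.splitlines()[0]}]"

# Specialized downstream prompts, one per category (illustrative names).
ROUTE_PROMPTS = {
    "refund": "You are a refunds specialist. Follow the refund policy.",
    "technical": "You are a support engineer. Ask for logs if needed.",
    "general": "You are a friendly generalist assistant.",
}

def classify(query: str) -> str:
    # Keyword stand-in; a real router would use an LLM or a trained classifier.
    q = query.lower()
    if "refund" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

def route(query: str) -> str:
    category = classify(query)
    return call_llm(f"{ROUTE_PROMPTS[category]}\nQuery: {query}")
```

The same structure supports model-tier routing: the router's output could select a model (e.g. Haiku vs. Sonnet) instead of, or in addition to, a prompt.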
Workflow: Parallelization
This workflow, parallelization, manifests in two key variations:
Sectioning: Breaking a task into independent subtasks run in parallel.
Voting: Running the same task multiple times to get diverse outputs.
Examples in the diagram
IMO: voting looks similar to Self-Consistency Improves Chain of Thought Reasoning in Language Models, though that method apparently varies the prompt slightly on each sample.
When to use this workflow: Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher confidence results.
Useful examples
Sectioning
Implementing guardrails where one model instance processes user queries while another screens them for inappropriate content or requests.
Automating evals for evaluating LLM performance, where each LLM call evaluates a different aspect of the model’s performance on a given prompt.
Voting
Reviewing a piece of code for vulnerabilities, where several different prompts review and flag the code if they find a problem.
Evaluating whether a given piece of content is inappropriate, with multiple prompts evaluating different aspects or requiring different vote thresholds to balance false positives and negatives.
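Both variations can be sketched with a thread pool: sectioning maps different subtasks across workers, voting maps the same task N times and takes the majority. `call_llm` is a hypothetical deterministic stand-in (it flags code containing `eval(`) so the vulnerability-review example above is concrete.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call; flags obvious eval() use.
    return "FLAG" if "eval(" in prompt else "OK"

def run_sections(subtasks):
    # Sectioning: independent subtasks run in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call_llm, subtasks))

def vote(prompt: str, n: int = 3) -> str:
    # Voting: the same task runs n times; the majority answer wins.
    # The vote threshold (here simple majority) tunes false positives/negatives.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(call_llm, [prompt] * n))
    return Counter(outputs).most_common(1)[0][0]

verdict = vote("Review this code for vulnerabilities:\nresult = eval(user_input)")
```

With a real model, voting only helps when samples are diverse (temperature > 0 or varied prompts), per the self-consistency note above.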
Workflow: Orchestrator-workers
A central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.
When to use this workflow: This workflow is well-suited for complex tasks where you can’t predict the subtasks needed (in coding, for example, the number of files that need to be changed and the nature of the change in each file likely depend on the task).
The key difference from parallelization is its flexibility: subtasks aren't pre-defined but are determined by the orchestrator based on the specific input.
Useful examples
Coding products that make complex changes to multiple files each time.
Search tasks that involve gathering and analyzing information from multiple sources for possible relevant information.
IMO: this resembles gpt-researcher.
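A sketch of the orchestrator-workers shape, using the coding example above. `plan_subtasks` stands in for the orchestrator's planning call (a real one would ask the model which files to change); `call_llm` is likewise a hypothetical stub. The key property shown is that the subtask list is computed from the input at runtime rather than fixed in advance.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    return f"[{prompt.splitlines()[0]}]"

def plan_subtasks(task: str):
    # Stand-in orchestrator planning step: here, one subtask per file
    # mentioned; a real orchestrator would be an LLM call returning a plan.
    return [f"Edit {name}" for name in task.split() if name.endswith(".py")]

def orchestrate(task: str) -> str:
    subtasks = plan_subtasks(task)  # decided at runtime, not pre-defined
    results = [call_llm(f"{sub}\nContext: {task}") for sub in subtasks]
    # Final synthesis call combines the workers' outputs.
    return call_llm("Synthesize the worker results:\n" + "\n".join(results))

out = orchestrate("Rename the helper used in utils.py and cli.py")
```

The worker calls could themselves run in parallel (as in the parallelization workflow) once the plan exists.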
Workflow: Evaluator-optimizer
One LLM call generates a response while another provides evaluation and feedback in a loop.
In the diagram, the flow loops between the Generator and the Evaluator.
When to use this workflow: This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value.
Useful examples
Literary translation where there are nuances that the translator LLM might not capture initially, but where an evaluator LLM can provide useful critiques.
Complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where the evaluator decides whether further searches are warranted.
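The generator/evaluator loop reduces to a few lines. Both `generate` and `evaluate` are deterministic stand-ins here (a real version makes two model calls, passing the evaluator's critique back to the generator); the translation framing follows the first example above.

```python
def generate(task: str, feedback: str = "") -> str:
    # Stand-in generator: a real one would pass the feedback to a model.
    draft = f"Translation of: {task}"
    return draft + (f" [revised: {feedback}]" if feedback else "")

def evaluate(draft: str):
    # Stand-in evaluator: a real one would be a second LLM call applying
    # the clear evaluation criteria the text says this workflow needs.
    if "[revised" in draft:
        return True, ""
    return False, "preserve the original tone"

def refine(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        accepted, feedback = evaluate(draft)
        if accepted:
            break
        draft = generate(task, feedback)
    return draft

result = refine("Le coeur a ses raisons")
```

The `max_rounds` cap matters in practice: without it, a picky evaluator can loop forever.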
Agents
Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgement.
During execution, it's crucial for agents to gain "ground truth" from the environment at each step (such as tool call results or code execution) to assess their progress.
Agents can handle sophisticated tasks, but their implementation is often straightforward.
They are typically just LLMs using tools based on environmental feedback in a loop.
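"LLMs using tools based on environmental feedback in a loop" can be made concrete with a toy coding agent. Everything here is a stand-in: `decide` replaces the model's tool-selection call, and `run_tests`/`fix_bug` are a fake environment. What the sketch preserves is the structure: act, observe ground truth, decide again, stop when done (with a step cap as a guardrail).

```python
STATE = {"bugs": 2}  # toy environment: a codebase with two failing tests

def run_tests() -> str:
    n = STATE["bugs"]
    return f"{n} tests failed" if n else "all tests passed"

def fix_bug() -> str:
    STATE["bugs"] -= 1
    return "patched one bug"

TOOLS = {"run_tests": run_tests, "fix_bug": fix_bug}

def decide(observation: str) -> str:
    # Stand-in policy: a real agent asks the model for the next tool call.
    if "passed" in observation:
        return "done"
    if "failed" in observation:
        return "fix_bug"
    return "run_tests"

def agent(task: str, max_steps: int = 10) -> str:
    observation = task
    for _ in range(max_steps):  # step cap as a simple guardrail on cost/errors
        action = decide(observation)
        if action == "done":
            break
        observation = TOOLS[action]()  # ground truth from the environment
    return observation

final_obs = agent("fix the failing build")
```

Note that the loop never trusts the model's claim of success; it re-runs the tests and only stops when the environment confirms they pass.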
See Appendix 2 of the original article.
When to use agents: Agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path.
The autonomous nature of agents means higher costs and the potential for compounding errors. We recommend extensive testing in sandboxed environments, along with appropriate guardrails.
Useful examples
A coding Agent to resolve SWE-bench tasks
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
“computer use” reference implementation
Computer use (beta) (Claude)